Clustering for Control YFP

/home/pjb40/jupytervenv/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
scanpy==1.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.16.5 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.23.1 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1
'/n/scratch3/groups/hsph/hbc/pjb40/scratch/TimeSeries_10X/data/velocyto_analysis/Only_controlN_Tumor/control_redo2'
AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime', 'AT1_marker_expr', 'AT2_marker_expr'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'dendrogram_DAY', 'dendrogram_louvain_r0.2', 'dendrogram_louvain_r0.4', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_genes_r0.4', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'
plotting the louvain clusters with resolutions 0.4 
AnnData object with n_obs × n_vars = 9199 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime', 'AT1_marker_expr', 'AT2_marker_expr'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'dendrogram_DAY', 'dendrogram_louvain_r0.2', 'dendrogram_louvain_r0.4', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_genes_r0.4', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'

Remove Cluster 8, 9 : which are not important with respect to AT1 or AT2 markers. Kept Cluster 0 this time.

Subclustering

AnnData object with n_obs × n_vars = 8856 × 1281 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime', 'AT1_marker_expr', 'AT2_marker_expr'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'dendrogram_DAY', 'dendrogram_louvain_r0.2', 'dendrogram_louvain_r0.4', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_genes_r0.4', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'

Recomputing the louvain clustering

Calculating PCA
computing PCA
    on highly variable genes
    with n_comps=50
    finished (0:00:06)
computing neighbors
    using 'X_pca' with n_pcs = 30
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:08)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:25)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 1 clusters and added
    'louvain__sub_r0.01', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 1 clusters and added
    'louvain_sub_r0.025', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 2 clusters and added
    'louvain_sub_r0.05', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 3 clusters and added
    'louvain_sub_r0.1', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 4 clusters and added
    'louvain_sub_r0.2', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 6 clusters and added
    'louvain_sub_r0.3', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 8 clusters and added
    'louvain_sub_r0.4', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 10 clusters and added
    'louvain_sub_r0.5', the cluster labels (adata.obs, categorical) (0:00:01)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:24)
plotting the louvain clusters with different resolutions

Likely :: Cluster 0,2,4,7 - AT1 and Cluster 3,1,5,6 - AT2

Finding markers for subclusters

Calculate marker genes in all clusters
ranking genes
    finished: added to `.uns['rank_genes_sub_r0.4']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:17)
    using 'X_pca' with n_pcs = 50
Storing dendrogram info using `.uns['dendrogram_louvain_sub_r0.4']`
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 2, 3, 4, etc.
plotting rank genes in each cluster
top five genes in each cluster
0 1 2 3 4 5 6 7
0 Chil1 Clu Trf Ager Cks1b Ly6a Psca Cbr3
1 Scd1 Sat1 Epas1 Hopx Stmn1 Krt7 Taldo1 Gstm1
2 Sftpc Cldn4 Hp Emp2 H2afz Npm1 Esd Gclc
3 Elovl1 Ffar4 Npc2 Cldn18 Tubb5 S100a6 Fth1 Cd36
4 Fasn Anxa1 Lrg1 Akap5 Cdk1 Ybx1 Gsn Arg2
Epithelial cell markers
Looking at the p-value for all these genes
0_n 0_p 1_n 1_p 2_n 2_p 3_n 3_p 4_n 4_p 5_n 5_p 6_n 6_p 7_n 7_p
0 Chil1 0.0 Clu 0.000000e+00 Trf 9.707982e-277 Ager 0.0 Cks1b 0.000000e+00 Ly6a 0.0 Psca 1.700959e-247 Cbr3 1.570147e-202
1 Scd1 0.0 Sat1 1.433883e-275 Epas1 7.244492e-236 Hopx 0.0 Stmn1 0.000000e+00 Krt7 0.0 Taldo1 1.026653e-246 Gstm1 1.144926e-193
2 Sftpc 0.0 Cldn4 1.056621e-271 Hp 5.015688e-222 Emp2 0.0 H2afz 2.950516e-290 Npm1 0.0 Esd 5.071826e-234 Gclc 4.940402e-191
3 Elovl1 0.0 Ffar4 3.412896e-270 Npc2 3.490102e-205 Cldn18 0.0 Tubb5 1.956899e-278 S100a6 0.0 Fth1 1.508392e-231 Cd36 3.495845e-175
4 Fasn 0.0 Anxa1 9.910788e-269 Lrg1 4.291901e-201 Akap5 0.0 Cdk1 4.354380e-273 Ybx1 0.0 Gsn 3.796230e-230 Arg2 3.205687e-160
saving top genes for subset clusters

Estimate RNA velocity on subclusters

Filtered out 107 genes that are detected 20 counts (shared).
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
Skip filtering by dispersion since number of variables are less than `n_top_genes`
WARNING: Did not normalize X as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
Skip filtering by dispersion since number of variables are less than `n_top_genes`
WARNING: Did not modify X as it looks preprocessed already.
computing neighbors
    finished (0:00:07) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:01) --> added 
    'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)
computing velocities
    finished (0:00:03) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
    finished (0:00:33) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing velocity embedding
    finished (0:00:02) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)

The velocity projections are embedded on louvain clusters. Lets refer to the direction of the arrows based on clusters

Computes terminal states (root and end points).

The end points and root cells are obtained as stationary states of the velocity-inferred transition matrix and its transposed, respectively, which is given by left eigenvectors corresponding to an eigenvalue of 1, i.e.

computing terminal states
    identified 2 regions of root cells and 2 regions of end points 
    finished (0:00:00) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
AnnData object with n_obs × n_vars = 8856 × 1174 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5', 'velocity_self_transition', 'velocity_length', 'velocity_confidence', 'velocity_confidence_transition', 'root_cells', 'end_points', 'velocity_pseudotime', 'AT1_marker_expr', 'AT2_marker_expr', 'louvain__sub_r0.01', 'louvain_sub_r0.025', 'louvain_sub_r0.05', 'louvain_sub_r0.1', 'louvain_sub_r0.2', 'louvain_sub_r0.3', 'louvain_sub_r0.4', 'louvain_sub_r0.5'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'velocity_gamma', 'velocity_r2', 'velocity_genes', 'spearmans_score', 'velocity_score'
    uns: 'DAY_colors', 'dendrogram_DAY', 'dendrogram_louvain_r0.2', 'dendrogram_louvain_r0.4', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.2_sizes', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'paga', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'rank_genes_r0.4', 'rank_velocity_genes', 'sample_colors', 'umap', 'velocity_graph', 'velocity_graph_neg', 'velocity_params', 'louvain__sub_r0.01_colors', 'louvain_sub_r0.025_colors', 'louvain_sub_r0.05_colors', 'louvain_sub_r0.1_colors', 'louvain_sub_r0.2_colors', 'louvain_sub_r0.3_colors', 'louvain_sub_r0.4_colors', 'louvain_sub_r0.5_colors', 'rank_genes_sub_r0.4', 'dendrogram_louvain_sub_r0.4'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs'
    layers: 'Ms', 'Mu', 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced', 'variance_velocity', 'velocity'
    obsp: 'connectivities', 'distances'

Velocity projection by clusters

The most fine-grained resolution of the velocity vector field we get at single-cell level, with each arrow showing the direction and speed of movement of an individual cell.

projecting the velocity based on sample: cells which belongs to sample are colored by velocity arrows

AD10 sample i.e. control sample DAY 10 sample with yellow arrow clearly have two seperate velocity. Meaning, this sample has two cell type populations going in two directions.

Interprete the velocities/Phase plots

Please be advised, not to limit biological conclusions to the projected velocities, but to examine individual gene dynamics via phase portraits to understand how inferred directions are supported by particular genes.

interpret a spliced vs. unspliced phase portrait. Gene activity is orchestrated by transcriptional regulation. Transcriptional induction for a particular gene results in an increase of (newly transcribed) precursor unspliced mRNAs while, conversely, repression or absence of transcription results in a decrease of unspliced mRNAs. Spliced mRNAs is produced from unspliced mRNA and follows the same trend with a time lag. Time is a hidden/latent variable. Thus, the dynamics needs to be inferred from what is actually measured: spliced and unspliced mRNAs as displayed in the phase portrait.

examine the phase portraits of some marker genes, visualized with scv.pl.velocity(adata, gene_names) or scv.pl.scatter(adata, gene_names).

Looks like stem cell genes ; which are known earlier does not explain velocity.

The black line corresponds to the estimated ‘steady-state’ ratio, i.e. the ratio of unspliced to spliced mRNA abundance which is in a constant transcriptional state. RNA velocity for a particular gene is determined as the residual, i.e. how much an observation deviates from that steady-state line.

Positive velocity indicates that a gene is up-regulated, which occurs for cells that show higher abundance of unspliced mRNA for that gene than expected in steady state. Conversely, negative velocity indicates that a gene is down-regulated.

Find genes that explains the directionality in the up-regulated and down-regulated genes.

Phase portrait for Fn1 gene explains the velocity nicely.

now lets look at some of the genes that came on top of the list in all four clusters.

  • Chil1 Clu Trf Ager Cks1b Ly6a Psca Cbr3
  • Scd1 Sat1 Epas1 Hopx Stmn1 Krt7 Taldo1 Gstm1
  • Sftpc Cldn4 Hp Emp2 H2afz Npm1 Esd Gclc
  • Elovl1 Ffar4 Npc2 Cldn18 Tubb5 S100a6 Fth1 Cd36
  • Fasn Anxa1 Lrg1 Akap5 Cdk1 Ybx1 Gsn Arg2
AAACCCACAAGTTCGT    1
AAACCCAGTTGGAGGT    6
AAACGAAGTGGACTGA    1
AAACGAAGTTATAGCC    6
AAAGAACAGCTCATAC    1
                   ..
TTTGTTGCACGACTAT    0
TTTGTTGCAGTGACCC    2
TTTGTTGGTATCGGTT    0
TTTGTTGTCGATTGAC    2
TTTGTTGTCTTACTGT    4
Name: louvain_sub_r0.4, Length: 8856, dtype: category
Categories (8, object): [0, 1, 2, 3, 4, 5, 6, 7]

Expression of S100a6 looks interesting it has highly unspliced cells and it falls in middle of DAY 7.

      0_n  0_p    1_n            1_p    2_n            2_p     3_n  3_p  \
0   Chil1  0.0    Clu   0.000000e+00    Trf  9.707982e-277    Ager  0.0   
1    Scd1  0.0   Sat1  1.433883e-275  Epas1  7.244492e-236    Hopx  0.0   
2   Sftpc  0.0  Cldn4  1.056621e-271     Hp  5.015688e-222    Emp2  0.0   
3  Elovl1  0.0  Ffar4  3.412896e-270   Npc2  3.490102e-205  Cldn18  0.0   
4    Fasn  0.0  Anxa1  9.910788e-269   Lrg1  4.291901e-201   Akap5  0.0   

     4_n            4_p     5_n  5_p     6_n            6_p    7_n  \
0  Cks1b   0.000000e+00    Ly6a  0.0    Psca  1.700959e-247   Cbr3   
1  Stmn1   0.000000e+00    Krt7  0.0  Taldo1  1.026653e-246  Gstm1   
2  H2afz  2.950516e-290    Npm1  0.0     Esd  5.071826e-234   Gclc   
3  Tubb5  1.956899e-278  S100a6  0.0    Fth1  1.508392e-231   Cd36   
4   Cdk1  4.354380e-273    Ybx1  0.0     Gsn  3.796230e-230   Arg2   

             7_p  
0  1.570147e-202  
1  1.144926e-193  
2  4.940402e-191  
3  3.495845e-175  
4  3.205687e-160  

look at cluster 2 and cluster 5, which has Trf and Krt7 gene expression in all cells

<Figure size 900x0 with 0 Axes>

Identify important genes through velocity

We need a systematic way to identify genes that may help explain the resulting vector field and inferred lineages. To do so, we can test which genes have cluster-specific differential velocity expression, being siginificantly higher/lower compared to the remaining population. we will runs a differential velocity t-test and outpus a gene ranking for each cluster.

ranking velocity genes
    finished (0:00:03) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
0 1 2 3 4 5 6 7 8 9
0 Smad7 Eps8l2 Efemp1 Zdhhc14 Blvrb Ptgr1 Foxp2 Areg Dlc1 Dlc1
1 Epb41l2 Dclk1 Ect2 Reps2 Gclc Cpm Atp6v0a4 Krt19 Ccdc146 Tspan8
2 Emp1 Pde7a Nfkbia Pdzd2 Qsox1 Mtus1 Dlc1 Zfp36l1 Atp6v0a4 Ccdc146
3 Adamts5 Cdkn2b Osbpl3 Phactr1 Poc1a Atp7b Pde1c Inhba Plcb4 Plcb4
4 F2r Gss Cav1 Arhgef28 Tjp1 Foxn3 Plcb4 Ybx3 Rbms3 C3
0      Smad7
1    Epb41l2
2       Emp1
3    Adamts5
4        F2r
Name: 0, dtype: object

kwargs = dict(frameon=False, size=10, linewidth=1.5, add_outline='0, 1, 2, 3, 4, 5, 6, 7, 8')

scv.pl.scatter(cdata_subset, df['0'][:5], ylabel='0', kwargs) scv.pl.scatter(cdata_subset, df['0'][:5], ylabel='0', kwargs)

Speed and coherence of velocity

Two more useful stats: - The speed or rate of differentiation is given by the length of the velocity vector. - The coherence of the vector field (i.e., how a velocity vector correlates with its neighboring velocities) provides a measure of confidence.

--> added 'velocity_length' (adata.obs)
--> added 'velocity_confidence' (adata.obs)

These provide insights where cells differentiate at a slower/faster pace, and where the direction is un-/determined.

Clusters 0 1 2 3 4 5 6 7 8 9
velocity_length 14.508276 14.358244 14.533466 14.624949 14.708447 14.591269 14.705872 14.626640 15.067565 14.635484
velocity_confidence 0.904226 0.904572 0.903691 0.901902 0.907460 0.905150 0.911922 0.910500 0.924264 0.916922

Velocity graph and pseudotime

We can visualize the velocity graph to portray all velocity-inferred cell-to-cell connections/transitions.

the graph can be used to draw descendents/anscestors coming from a specified cell. Highlighted cells can be traced back to its potential fate.

Finally, based on the velocity graph, a velocity pseudotime can be computed. After inferring a distribution over root cells from the graph, it measures the average number of steps it takes to reach a cell after walking along the graph starting from the root cells.

Contrarily to diffusion pseudotime, it implicitly infers the root cells and is based on the directed velocity graph instead of the similarity-based diffusion kernel.

Trying dynamic mode in velocity

recovering dynamics
    finished (0:22:46) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
computing velocities
    finished (0:00:12) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph
    finished (0:00:14) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing latent time using root_cells as prior
    finished (0:00:04) --> added 
    'latent_time', shared time (adata.obs)
Index(['Acoxl', 'Napsa', 'Sfta2', 'Elovl1', 'Tmem243', 'Rbms3', 'Hc', 'Lrp2',
       'Car8', 'Kif11',
       ...
       'Rad51', 'Rtn4', 'Rassf8', 'Slc26a4', 'Cdc42ep5', 'Gclm', 'Taldo1',
       '5330417C22Rik', 'Cd200', 'Crlf1'],
      dtype='object', length=300)

IGFBP7 and Napsa is clearly a marker for cells in transition.

ranking velocity genes
    finished (0:00:02) --> added 
    'rank_velocity_genes', sorted scores by group ids (adata.uns) 
    'spearmans_score', spearmans correlation scores (adata.var)
0 1 2 3 4
0 Serpinf1 Cd55 Ptgs1 Rtkn2 Adam19
1 Epha2 Ptpn14 Tmem243 Sema3a Sorcs2
2 Cd36 Kank3 Ctsl Zdhhc14 St3gal4
3 Qk Por Soat1 Pdzd2 Slc26a4
4 Aox3 Epb41l4a Ces1d Pde8b St3gal5

Saving the data

STOP HERE AND DO NOT RUN BELOW. YOU CAN LOAD SAVE